Initial Implementation #1

Closed
dongjoon-hyun (Member) commented Mar 10, 2025

To the reviewers.

This PR only presents the initial working code as a whole.

I'm going to split it into smaller, more reviewable PRs.

dongjoon-hyun marked this pull request as draft March 10, 2025 23:45
dongjoon-hyun added a commit that referenced this pull request Mar 11, 2025
…test `build`

### What changes were proposed in this pull request?

This PR aims to set up the `SparkConnect` Swift package structure and a CI job to test `build`.

Note that this is a subset of the initial implementation.
- #1

### Why are the changes needed?

To set up the initial package structure with CI build test coverage before adding the actual code. Currently, the following two OSs are tested.
- macOS 15
- Ubuntu 24.04

According to the standard `Swift` package structure,
- https://www.swift.org/getting-started/library-swiftpm/#bootstrapping

this PR adds the following structure for the `SparkConnect` package. `SparkConnectError.swift` and `BuilderTests.swift` are added in order to fill the otherwise empty directories.
```
$ tree .
.
├── dev
│   └── merge_spark_pr.py
├── LICENSE
├── Package.swift
├── README.md
├── Sources
│   └── SparkConnect
│       └── SparkConnectError.swift
└── Tests
    └── SparkConnectTests
        └── BuilderTests.swift
```
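
For orientation, a minimal `Package.swift` matching the tree above could look like the following sketch. The target names come from the directory layout; the tools version and the single library product are assumptions, and the actual manifest in this repository may declare platforms and dependencies that are not shown here.

```swift
// swift-tools-version: 6.0
// Illustrative manifest only; the tools version above is an assumption.
import PackageDescription

let package = Package(
    name: "SparkConnect",
    products: [
        // Expose the library target to downstream packages.
        .library(name: "SparkConnect", targets: ["SparkConnect"])
    ],
    targets: [
        // Sources/SparkConnect
        .target(name: "SparkConnect"),
        // Tests/SparkConnectTests
        .testTarget(name: "SparkConnectTests", dependencies: ["SparkConnect"]),
    ]
)
```

With a manifest of this shape, `swift build` compiles the library target and `swift test` runs the test target.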

### Does this PR introduce _any_ user-facing change?

No. This is not released yet.

### How was this patch tested?

Pass the CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #4 from dongjoon-hyun/SPARK-51461.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Mar 11, 2025
### What changes were proposed in this pull request?

This PR aims to use `Apache Arrow Swift` 19.0.1 by copying its sources into this package. The copy will be replaced with a package dependency once the Apache Arrow Swift package is released.

This is a part of the initial implementation.
- #1

### Why are the changes needed?

Apache Arrow 19.0.1 is the latest version.
- https://arrow.apache.org/release/19.0.1.html (2025-02-16)

Apache Arrow 19.0.1 includes `Swift` source code, but it needs the following changes before it can be used here.
- For the `Arrow` package, two places must change to compile in `Swift 6.0`, and `ArrowCExporter.swift` and `ArrowCImporter.swift` must be excluded.

```
$ git clone -b apache-arrow-19.0.1 https://github.com/apache/arrow.git
$ cd arrow/swift/Arrow/Sources/Arrow/
$ rm ArrowC*
$ cp * ~/spark-connect-swift/Sources/SparkConnect/
```

```swift
- public enum ArrowTypeId {
+ public enum ArrowTypeId: Sendable {
```

```swift
- public enum Info {
+ public enum Info: Sendable {
```

- For the `ArrowFlight` package, two places must be updated to compile in `Swift 6`, and only the following three files are used.
  - Flight.pb.swift
  - FlightData.swift
  - FlightDescriptor.swift

```
$ git clone -b apache-arrow-19.0.1 https://github.com/apache/arrow.git
$ cd arrow/swift/ArrowFlight/Sources/ArrowFlight
$ cp Flight.pb.swift FlightData.swift FlightDescriptor.swift ~/spark-connect-swift/Sources/SparkConnect
```

```swift
-  static var allCases: [Arrow_Flight_Protocol_CancelStatus] = [
+  static let allCases: [Arrow_Flight_Protocol_CancelStatus] = [
```

```swift
-  static var allCases: [Arrow_Flight_Protocol_FlightDescriptor.DescriptorType] = [
+  static let allCases: [Arrow_Flight_Protocol_FlightDescriptor.DescriptorType] = [
```
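
The four edits above follow one pattern: under the Swift 6 language mode, shared static state has to be immutable and of a `Sendable` type. The sketch below illustrates that pattern with hypothetical stand-in types (`ExampleTypeId` and `ExampleField` are not Arrow declarations); it is an illustration of the rule, not the vendored code.

```swift
// Hypothetical stand-ins for illustration; these are NOT the Arrow types.

// A public enum is marked `Sendable` explicitly so that its values
// can be shared across concurrency domains.
public enum ExampleTypeId: Sendable {
    case int32
    case string
}

public struct ExampleField: Sendable {
    public let name: String
    public let type: ExampleTypeId

    // `static let` of a Sendable type is immutable shared state.
    // Declaring this as `static var` would be rejected in Swift 6
    // language mode as global shared mutable state.
    public static let defaults: [ExampleField] = [
        ExampleField(name: "id", type: .int32),
        ExampleField(name: "name", type: .string),
    ]
}
```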

Lastly, `swift format` is applied.
```
$ swift format -i *.swift
```

### Does this PR introduce _any_ user-facing change?

No, this is not released yet.

### How was this patch tested?

Pass the CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #6 from dongjoon-hyun/SPARK-51465.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Mar 12, 2025
### What changes were proposed in this pull request?

This PR aims to add a Swift `SparkConnectClient` actor that encapsulates `gRPC` connections, similar to the clients in other languages.
- Swift (this PR)
```swift
public actor SparkConnectClient {
```

- [Scala](https://github.com/apache/spark/blob/master/sql/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala):
```scala
private[sql] class SparkConnectClient(
```

- [Python](https://github.com/apache/spark/blob/master/python/pyspark/sql/connect/client/core.py#L597)
```python
class SparkConnectClient(object):
```

This is a part of the following.
- #1

### Why are the changes needed?

To use `gRPC` in the upper `SparkSession` and `DataFrame` layers easily.
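
As a rough illustration of why an actor suits this role, the sketch below wraps a connection behind an actor so that upper layers only call `async` methods. Everything in it is hypothetical: `ExampleTransport`, `EchoTransport`, and `ExampleClient` are invented for this example, no real gRPC API is used, and it does not reflect the actual `SparkConnectClient` implementation.

```swift
// Hypothetical abstraction standing in for a real gRPC channel.
protocol ExampleTransport: Sendable {
    func send(_ request: String) async throws -> String
}

// Stand-in transport that echoes the request back.
struct EchoTransport: ExampleTransport {
    func send(_ request: String) async throws -> String {
        "echo: \(request)"
    }
}

// Illustrative actor; NOT the SparkConnectClient added by this PR.
// The actor serializes access to its mutable state, so callers in
// upper layers do not need their own locking.
actor ExampleClient {
    private let transport: any ExampleTransport
    private var requestCount = 0

    init(transport: any ExampleTransport) {
        self.transport = transport
    }

    // Forward a request through the transport and count it.
    func execute(_ request: String) async throws -> String {
        requestCount += 1
        return try await transport.send(request)
    }

    func sentRequests() -> Int {
        requestCount
    }
}
```

A caller would create `ExampleClient(transport: EchoTransport())` and use `try await client.execute("SELECT 1")` from an asynchronous context.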

### Does this PR introduce _any_ user-facing change?

No, this is not released.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7 from dongjoon-hyun/SPARK-51472.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun (Member, Author) commented Mar 12, 2025

I'm closing this PR because all contents are merged via the following

I'm moving forward to

  • Adding more test cases for various data types and clarifying the support matrix of types.
  • Adding missing features.
  • Polishing the implementation.
  • Integrating with Swift Package Index.
  • Improving CIs by enabling the Spark Connect server via Docker in GitHub Action CIs.

The first MVP (Minimum Viable Product) focuses on the SQL area of Apache Spark 4.0.0, including the following:
